I.  INTRODUCTION


Since the end of World War II and up until recent years, the United States Army has
conducted limited live-fire tests of armored fighting vehicles (AFVs) to investigate the
interaction between munitions and these vehicles. The live-fire tests were conducted only
occasionally because of the extremely high cost of the resources they require. Although
vulnerability studies of AFVs have used the insights gathered from such vehicle tests, they
have relied more on mathematical modeling, computer simulation, live-fire tests of
components, and inferences made from firing at armor plate. Live-fire testing of armored
vehicles has recently intensified, although it still involves a very limited number of
vehicles and shots. One question that Army researchers wish to answer is how well computer
model predictions compare with the results of live-fire field testing of AFVs. The answer to
that question is the topic of this report.

The outcome of a direct hit on a target vehicle may be examined on three different levels.
We may look at

1.  the  entire  system  (e.g.,  catastrophic  kill), 

2.  subsystems  (e.g.,  personnel,  fire  control),  and 

3.  components  (e.g.,  projectile  tubes,  propellant  cases). 

If the test results are described as either "kill" or "no-kill", we have a Bernoulli trial in
which the outcome can be one of only two possible states. Vulnerability estimates are
expressed as kill probabilities (Pk's), which represent the proportion of hits resulting in a
kill.

Recently a computer model has been developed that incorporates randomness in its
calculations, so that simulated repeated firings at an AFV under identical shot conditions
produce varying degrees of destruction. Through many runs of the model, vulnerability
researchers can obtain hypothesized values (or estimates) of the true Pk's for the entire
system, subsystems, and components. It would be an experimental luxury to be able to fire
munitions at hundreds of AFVs under the same shot conditions to see how well these
hypothesized values from the model replicate the live-fire results. Because of the
destructive nature of the test and the cost of AFVs, such an experiment is economically
infeasible. Usually the same munition or different munition types are fired at vehicles under
varying shot conditions with no duplication of shots, and the experimenter is left to assess
the validity of computer-based vulnerability estimates from the firing of a single round. It
is impossible to statistically analyze a single hypothesized Pk on the basis of one fired
round. However, if we look at a group of components, for example, then we can make a
statistically valid statement about the corresponding group of Pk's if we assume that the
components are independent. Independence here means that the outcome for any component (kill
or no-kill) has no influence on the probability that the other components in the group will
be killed.

This report details four procedures for testing a group of hypothesized probabilities.
The argument is presented that one of the four is the asymptotically most powerful test of
the possible procedures. This problem was first studied by Dr. J. Richard Moore, formerly of
the US Army Ballistic Research Laboratory (BRL), in response to requests from the
Vulnerability/Lethality Division (VLD) of BRL. The author joined Dr. Moore in his research in
1986. Since then VLD has used some of the results in examining computed estimates, which were
calculated with an expected value model, for consistency with observed test results from
firings at AFVs.


II.  TEST  CONCEPTS

Assume that as a result of our computer simulation, we obtain a set of Pk estimates.
Perhaps they are for a group of components within a subsystem of the AFV. Denote this set of
estimates by the vector $[p_1^0, p_2^0, \ldots, p_l^0]$, where $p_i^0$ is the estimated kill
probability of the ith component of interest and $l$ is the number of components. Also, let
the true but unknown kill probabilities be denoted by the vector $[p_1, p_2, \ldots, p_l]$.
If we assume that the components are independent, then we may begin to develop our test
strategy by writing the null hypothesis:

$$H_0:\ p_1 = p_1^0,\ p_2 = p_2^0,\ \ldots,\ p_l = p_l^0 .$$

Note that while this is similar to the hypothesis for the binomial test, one fundamental
difference exists: we allow the $p_i^0$'s to be unequal. We call this a test of generalized
binomial proportions. The binomial test is the special case in which $p_i^0 = p_j^0$ for all
$i, j$.

If the data do not support the null hypothesis, then it is rejected in favor of its converse,
the alternative hypothesis,

$$H_A:\ p_i \ne p_i^0 \ \text{for some } i .$$

The  alternative  hypothesis  states  that  only  one  inequality  has  to  exist;  i.e.,  only  one  estimate 
needs  to  be  incorrect.  However,  because  the  analysis  is  based  upon  as  little  as  one  round, 
gross  inequalities  are  needed  before  a  procedure  will  be  able  to  reject  the  null  hypothesis  with 
satisfactory  power. 

Suppose we observe a set of $l$ independent Bernoulli outcomes from the live-fire testing
(denoted by 0 or 1, corresponding to no-kill or kill, respectively), and write them in the
form of a row vector $A = [a_1, a_2, \ldots, a_l]$. For example, if $l = 5$, we may observe
$A = [0, 1, 0, 0, 1]$. There are $2^l$ possible outcome vectors $A_1, A_2, \ldots, A_{2^l}$,
which we collectively define to be $\Omega$. Any test of the null hypothesis requires a
measure of performance (MOP) for each of the $2^l$ outcomes and some ordering of the measure.
At this point we branch our discussion into four different MOPs and thus four different
testing procedures.
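
As a minimal sketch of the setup just described, the following Python fragment enumerates the $2^l$ outcome vectors and computes $P(A)$ for each under the independence assumption. The function name `outcome_probabilities` is hypothetical, and the five probabilities are simply illustrative inputs.

```python
from itertools import product

def outcome_probabilities(p0):
    """Map each 0/1 outcome vector A to P(A), assuming independent components."""
    table = {}
    for a in product((0, 1), repeat=len(p0)):
        prob = 1.0
        for a_i, p_i in zip(a, p0):
            # A kill (1) contributes p_i; a no-kill (0) contributes 1 - p_i.
            prob *= p_i if a_i == 1 else 1.0 - p_i
        table[a] = prob
    return table

p0 = [0.23, 0.64, 0.19, 0.91, 0.70]   # illustrative hypothesized kill probabilities
table = outcome_probabilities(p0)
print(len(table))                      # number of outcome vectors, 2**5
print(round(table[(0, 1, 0, 0, 1)], 5))
```

Every procedure that follows works from this same table; the procedures differ only in how they rank its entries.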

III.  PROCEDURE  1  -  THE  ORDER  BY  PROBABILITY  (OP)  PROCEDURE


This procedure rejects the null hypothesis if the observed vector is among a defined critical
set of "rarest" outcomes. The MOP for the procedure is simply $P(A)$, the probability with
which outcome $A$ occurs assuming our hypothesized probabilities
$p_1^0, p_2^0, \ldots, p_l^0$. The outcome set, $\Omega$, is ordered by $P(A)$ in increasing
magnitude, and each outcome is numbered so that $A_{(1)}$ is the least likely outcome and
$A_{(2^l)}$ is the most likely. We then define a cumulative function $B$, where

$$B_i = \begin{cases} P(A_{(i)}), & i = 1 \\ B_{i-1} + P(A_{(i)}), & i = 2, 3, 4, \ldots, 2^l \end{cases}$$

We pick a desired level of significance, $\alpha$, and find $c$ such that
$c = \max\{\, j \mid B_j \le \alpha \ \text{and}\ P(A_{(j)}) \ne P(A_{(j+1)}) \,\}$. Then the
set $RR_{OP} = \{A_{(1)}, \ldots, A_{(c)}\}$ represents the $c$ rarest outcomes in $\Omega$
and is the rejection region for the test of $H_0$ at a $100\alpha\%$ level of significance.
The "test statistic" is the observed outcome vector $A$; if $A \in RR_{OP}$, then $H_0$ is
rejected.

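The OP steps above can be sketched in Python as follows. This is an illustrative reading of the procedure, not the program listed in the report's Appendix; the function name `op_rejection_region` and the tie tolerance `tol` are assumptions introduced here.

```python
from itertools import product

def op_rejection_region(p0, alpha=0.05, tol=1e-12):
    """Return the c rarest outcomes whose cumulative probability stays within alpha."""
    outcomes = []
    for a in product((0, 1), repeat=len(p0)):
        prob = 1.0
        for a_i, p_i in zip(a, p0):
            prob *= p_i if a_i else 1.0 - p_i
        outcomes.append((prob, a))
    outcomes.sort()                        # A_(1) is the least likely outcome
    cum, c = 0.0, 0
    for j, (prob, _) in enumerate(outcomes, start=1):
        cum += prob                        # cumulative function B_j
        if cum > alpha + tol:
            break
        is_last = j == len(outcomes)
        # Accept j as a boundary only if it does not split two equally likely outcomes.
        if is_last or abs(prob - outcomes[j][0]) > tol:
            c = j
    return [a for _, a in outcomes[:c]]

region = op_rejection_region([0.23, 0.64, 0.19, 0.91, 0.70], alpha=0.05)
print(len(region))
```

Applied to the five hypothesized probabilities of Section VII with $\alpha = .05$, this sketch selects the 14 outcomes listed for the OP procedure in Figure 6.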

IV.  PROCEDURE  2  -  THE  KILLS  TEST

This test uses the number of kills (1's) observed as its MOP. The underlying notion is that
under the null hypothesis, a certain number of kills is expected. Letting $K(A)$ denote the
number of kills in our observed outcome vector $A$, the expected value of $K(A)$ is

$$E[K(A)] = p_1^0 + p_2^0 + \cdots + p_l^0 = \sum_{i=1}^{l} p_i^0 .$$

If the observed $K(A)$ is much smaller than $E[K(A)]$, then perhaps the model estimates are
inflated estimates of the true kill probabilities. Likewise, if the observed $K(A)$ is much
larger than $E[K(A)]$, then the estimated kill probabilities are probably too small.

To perform this test, we begin by calculating $K(A)$ and $P(A)$ for all $2^l$ outcomes. The
outcomes are then ordered in increasing magnitude by $K(A)$ and numbered, so that

$$K(A_{(1)}) \le K(A_{(2)}) \le \cdots \le K(A_{(2^l)}) .$$




The order among outcomes with equal $K(A)$ is irrelevant. As in the OP procedure, the
"cumulative function" is calculated. Since rejecting $H_0$ may be due to either too small or
too large a value of $K(A)$, a two-tailed test is used. Critical values $c_1$ and $c_2$ are
selected so that the actual alpha level

$$P[K(A) \le c_1] + P[K(A) \ge c_2]$$

is maximized but still less than or equal to $\alpha$. The rejection region for this test is
$RR_K = \{A \mid K(A) \in \{0, 1, \ldots, c_1\} \cup \{c_2, c_2 + 1, \ldots, l\}\}$. The model
estimates will be rejected as inconsistent with the field tests if $A \in RR_K$.
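
A rough Python sketch of the Kills test follows. Rather than enumerating all $2^l$ outcomes, it builds the distribution of $K(A)$ by convolving one component at a time, which is a substitution made here for brevity and gives the same $P[K(A) = k]$. The function names and the convention that $c_1 = -1$ or $c_2 = l + 1$ marks an empty tail are assumptions of this sketch.

```python
def kills_distribution(p0):
    """P[K = k] for k = 0..l, where K is a sum of independent Bernoulli(p_i)."""
    dist = [1.0]
    for p in p0:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)     # component survives
            nxt[k + 1] += mass * p         # component is killed
        dist = nxt
    return dist

def kills_critical_values(p0, alpha=0.05):
    """Search all (c1, c2) pairs for the largest actual alpha not exceeding alpha."""
    dist = kills_distribution(p0)
    l = len(p0)
    best = (0.0, -1, l + 1)                # (actual alpha, c1, c2); both tails empty
    for c1 in range(-1, l + 1):
        lower = sum(dist[: c1 + 1])        # P[K <= c1]
        for c2 in range(c1 + 1, l + 2):
            size = lower + sum(dist[c2:])  # add P[K >= c2]
            if best[0] < size <= alpha:
                best = (size, c1, c2)
    return best

size, c1, c2 = kills_critical_values([0.23, 0.64, 0.19, 0.91, 0.70])
print(c1, c2, round(size, 5))
```

For the Section VII example this search returns $c_1 = 0$ and $c_2 = 5$, so only the all-zeros and all-ones vectors are rejected, in agreement with Figure 6.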

V.  PROCEDURE  3  -  THE  MORE-LIKELY  RESPONSE  (MLR)  TEST 

This test examines the number of more-likely, or "correct", responses, where a more-likely
response is defined as

$$\gamma_i = \begin{cases} 1 & \text{if } a_i = 0 \text{ when } p_i^0 < .5, \text{ or if } a_i = 1 \text{ when } p_i^0 > .5 \\ .5 & \text{if } p_i^0 = .5 \\ 0 & \text{otherwise} \end{cases}$$

In other words, a more-likely response is the response which we expect to see more often than
not in the long run. So if $p_i^0 = .8$ we would expect to observe a kill more often than a
no-kill. If $a_i = 1$, a kill, then $\gamma_i = 1$ and the observed response is considered
"correct". When $p_i^0 = .5$, we are essentially saying that we have no inclination as to
which response is more likely. Therefore we compromise and always assign $\gamma_i = .5$.

The MOP is the total number of correct responses

$$M(A) = \gamma_1 + \gamma_2 + \cdots + \gamma_l = \sum_{i=1}^{l} \gamma_i .$$

The reasoning behind this procedure is that if we observe an unusually low number of
more-likely responses, then our model estimates are too large when they should be smaller
and/or too small when they should be larger. We also note that it is possible to observe too
many correct responses. This would tend to indicate that our large estimates ($p_i^0 > .5$)
are not large enough and/or that our small estimates ($p_i^0 < .5$) are not small enough.

The expected value of $M(A)$ is

$$E[M(A)] = M_L + S/2 + M_U$$

where

$$M_L = \sum_j (1 - p_j^0) \quad \text{for all } p_j^0 < .5$$

$$M_U = \sum_j p_j^0 \quad \text{for all } p_j^0 > .5$$

$$S = \text{number of } p_j^0 \text{ equal to } .5$$

We start by calculating $M(A)$ and $P(A)$ for all possible outcomes. The outcomes are arranged
in increasing magnitude by $M(A)$ without regard for ties so that

$$M(A_{(1)}) \le M(A_{(2)}) \le \cdots \le M(A_{(2^l)}) .$$


The cumulative function is computed as usual. Since obtaining a value of $M(A)$ much smaller
or larger than the expected value leads us to believe that $H_0$ is false, a two-tailed test
is desired. Critical values $c_1$ and $c_2$ are selected as in the Kills test to maximize the
actual alpha level. The rejection region becomes
$RR_{MLR} = \{A \mid M(A) \in \{0, 1, \ldots, c_1\} \cup \{c_2, c_2 + 1, \ldots, l\}\}$, and
we will reject $H_0$ at the $\alpha$ level of significance if $A \in RR_{MLR}$. In practice,
though, $c_2$ will usually not exist and a one-tailed test will be used instead.
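
The MLR test can be sketched in the same style. The code below is an assumption-laden illustration: `gamma` and `mlr_rejection_region` are hypothetical names, and the tails are chosen by brute-force search over the distinct values of $M(A)$, which is one way to "maximize the actual alpha level" as described.

```python
from itertools import product
from collections import defaultdict

def gamma(a_i, p_i):
    """Score 1 for the more-likely response, .5 when p_i is exactly .5, else 0."""
    if p_i == 0.5:
        return 0.5
    return 1.0 if (a_i == 1) == (p_i > 0.5) else 0.0

def mlr_rejection_region(p0, alpha=0.05):
    mass = defaultdict(float)              # P[M(A) = m]
    score = {}                             # M(A) for each outcome vector
    for a in product((0, 1), repeat=len(p0)):
        prob = 1.0
        for a_i, p_i in zip(a, p0):
            prob *= p_i if a_i else 1.0 - p_i
        score[a] = sum(gamma(a_i, p_i) for a_i, p_i in zip(a, p0))
        mass[score[a]] += prob
    values = sorted(mass)
    n = len(values)
    best = (0.0, 0, n)                     # (actual alpha, lower count, upper start)
    for lo in range(n + 1):                # reject the lo smallest values of M ...
        for hi in range(lo, n + 1):        # ... and the values from index hi upward
            size = sum(mass[v] for v in values[:lo] + values[hi:])
            if best[0] < size <= alpha:
                best = (size, lo, hi)
    size, lo, hi = best
    rejected = set(values[:lo] + values[hi:])
    return size, [a for a in score if score[a] in rejected]

size, region = mlr_rejection_region([0.23, 0.64, 0.19, 0.91, 0.70])
print(len(region), round(size, 5))
```

With the Section VII probabilities the upper tail stays empty and the region consists of the six vectors with $M(A) \le 1$, as in Figure 6.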


VI.  PROCEDURE  4  -  THE  SQUARED  DISTANCE  MEASURE  (SDM)  TEST 

This test involves the calculation of a "squared distance measure" for each component of the
outcome vector. The SDM is $(p_i^0 - a_i)^2$. Squaring ensures that all values are
nonnegative so that each component produces an additive effect; it also increases the
"penalty" for responses which are very far from $p_i^0$. Note that the SDM for any given
component must lie in the interval [0,1], and the two values the SDM may take on are more
extreme the nearer $p_i^0$ is to 0 or 1. The SDM acts as a penalty function. As $p_i^0$
approaches 0 (or 1), the penalty associated with being incorrect is greater. If $p_i^0$ is
close to .5 (i.e., we have less confidence in our ability to predict $a_i$), then the penalty
for an incorrect response is not much different from the SDM for a correct response. The MOP
is simply the sum of the SDMs,

$$S(A) = (p_1^0 - a_1)^2 + (p_2^0 - a_2)^2 + \cdots + (p_l^0 - a_l)^2 = \sum_{i=1}^{l} (p_i^0 - a_i)^2 .$$

The expected value of $S(A)$ is

$$E[S(A)] = \sum_{i=1}^{l} p_i^0 (1 - p_i^0) .$$
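
The expected value of $S(A)$ can be checked term by term. Under $H_0$ each $a_i$ is a Bernoulli variable with $P(a_i = 1) = p_i^0$, so a short derivation (supplied here as a worked step, not quoted from the report) is:

```latex
\begin{aligned}
E\left[(p_i^0 - a_i)^2\right]
  &= p_i^0\,(p_i^0 - 1)^2 + (1 - p_i^0)\,(p_i^0 - 0)^2 \\
  &= p_i^0 (1 - p_i^0)\left[(1 - p_i^0) + p_i^0\right] \\
  &= p_i^0 (1 - p_i^0),
\end{aligned}
\qquad\text{so}\qquad
E[S(A)] = \sum_{i=1}^{l} p_i^0 (1 - p_i^0).
```

Each term is the Bernoulli variance of the corresponding component, which is largest at $p_i^0 = .5$ and vanishes as $p_i^0$ approaches 0 or 1.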

Again, we calculate $S(A)$ and $P(A)$ for each of the $2^l$ outcomes, and arrange them in
decreasing magnitude by $S(A)$ with no regard for ties so that

$$S(A_{(1)}) \ge S(A_{(2)}) \ge \cdots \ge S(A_{(2^l)}) .$$

The $B_j$'s are computed in the usual fashion. We would tend to believe that $H_0$ is false if
$S(A)$ is too large; therefore a one-tailed procedure is used. Given $\alpha$, we select $c$
which satisfies

$$c = \max\{\, j \mid B_j \le \alpha \ \text{and}\ S(A_{(j)}) \ne S(A_{(j+1)}) \,\} .$$

The set of outcomes $RR_S = \{A \mid S(A) \ge S(A_{(c)})\}$ represents the rejection region
for our test of $H_0$. Therefore if $S(A) \ge S(A_{(c)})$ we reject $H_0$ at the $\alpha$
level of significance.
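
A hedged Python sketch of the SDM test is given below. The name `sdm_rejection_region` is hypothetical, and rounding $S(A)$ to `ndigits` places is an implementation choice made here so that ties in $S(A)$ compare as exactly equal in floating-point arithmetic.

```python
from itertools import product

def sdm_rejection_region(p0, alpha=0.05, ndigits=9):
    """Reject the outcomes with the largest S(A), never splitting a tie in S(A)."""
    rows = []
    for a in product((0, 1), repeat=len(p0)):
        prob, s = 1.0, 0.0
        for a_i, p_i in zip(a, p0):
            prob *= p_i if a_i else 1.0 - p_i
            s += (p_i - a_i) ** 2          # squared distance for this component
        rows.append((round(s, ndigits), prob, a))
    rows.sort(key=lambda r: -r[0])         # largest S(A) first
    cum, c = 0.0, 0
    for j, (s, prob, _) in enumerate(rows, start=1):
        cum += prob                        # cumulative function B_j
        if cum > alpha:
            break
        if j == len(rows) or s != rows[j][0]:
            c = j                          # valid boundary: next S(A) differs
    if c == 0:
        return []
    threshold = rows[c - 1][0]
    return [a for s, _, a in rows if s >= threshold]

region = sdm_rejection_region([0.23, 0.64, 0.19, 0.91, 0.70])
print(len(region))
```

For the Section VII example the sketch stops at $c = 13$: the 14th and 15th outcomes share the same $S(A)$, and admitting both would push the cumulative probability past $\alpha = .05$.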


VII.  AN  ILLUSTRATIVE  EXAMPLE 


Assume that the model estimates of kill probabilities for five independent tank components
are as follows:

$$[p_1^0, p_2^0, p_3^0, p_4^0, p_5^0] = [.23, .64, .19, .91, .70]$$

Figure 1 shows each of the $2^5 = 32$ possible vector outcomes along with their associated
$P(A_i)$, $K(A_i)$, $M(A_i)$, and $S(A_i)$. The outcomes are ordered by a binary counting
scheme. The OP procedure is illustrated in Figure 2. Note that the vectors are now ordered by
their probability of occurrence. The rejection region for an $\alpha = .05$ level of
significance is all the outcomes above the line. Figure 3 shows the Kills test ordering scheme
and the resultant two-tailed rejection region outside the two lines. Note the additional
columns $P[K(A_{(i)})]$ and $B[K(A_{(i)})]$. Since our test statistic is $K(A)$, vectors
having an equal number of kills are indistinguishable. Therefore $P[K(A_{(i)})]$ represents
the probability of getting $K(A_{(i)})$ kills and $B[K(A_{(i)})]$ represents the cumulative
probability for the same number of kills. In Figure 4, the MLR test is shown. Although a
two-tailed procedure can be used, the rejection region only includes a lower tail of six
vectors. This is because the vector with $M(A_{(32)}) = 5$ has a probability mass greater than
alpha. The columns $P[M(A_{(i)})]$ and $B[M(A_{(i)})]$ are analogous to the additional columns
of Figure 3. We see the SDM test in Figure 5. It has a rejection region of 13 vectors
containing the largest values of $S(A_{(i)})$. Note that $B_{14} < \alpha$; however,
$A_{(14)}$ is not in the rejection region. This is because $S(A_{(14)}) = S(A_{(15)})$ and
$B_{15} > \alpha$. Recall that in each of the tests, outcomes with equal MOPs are considered
indistinguishable. If we had allowed $A_{(14)} \in RR_S$ and $A_{(15)} \notin RR_S$, then we
would be violating the rule by differentiating between two outcomes with the same SDM. Figure
6 summarizes the rejection regions of the four procedures, with OP having the largest region
and the Kills test having the smallest.

Figure 1. Hypothetical 5-component example: All possible outcomes and measures of
performance.

The hypothesized probabilities are: [0.23, 0.64, 0.19, 0.91, 0.70]

Vector    Prob.       Cum. Prob.
A_(i)     P(A_(i))    B_i

10100     0.00042     0.00042
11100     0.00076     0.00118
10101     0.00099     0.00217
00100     0.00142     0.00359
11101     0.00176     0.00536
10000     0.00181     0.00717
01100     0.00253     0.00969
11000     0.00322
00101     0.00332
10001     0.00423     0.02046
10110     0.00429     0.02475
01101     0.00590     0.03065
00000     0.00606     0.03671
11001     0.00751     0.04422
--------------------------------
11110     0.00764     0.05186
10111     0.01002     0.06188
01000     0.01078     0.07266
00001     0.01415     0.08680
00110     0.01438     0.10118
11111     0.01782     0.11900
10010     0.01831     0.13731
01001     0.02515     0.16246
01110     0.02556     0.18802
11010     0.03255     0.22057
00111     0.03355     0.25412
10011     0.04272     0.29684
01111     0.05964     0.35648
00010     0.06130     0.41778
11011     0.07595     0.49373
01010     0.10897     0.60270
00011     0.14303     0.74573
01011     0.25427     1.00000

Figure  2.  Hypothetical  5-component  example:  Summary  of  Order  by  Probability 
(OP)  Procedure. 

The hypothesized probabilities are: [0.23, 0.64, 0.19, 0.91, 0.70]

                              Probability               Cumulative Probability
 i   Vector   Kills       P(A_(i))   P[K(A_(i))]    B_i        B[K(A_(i))]
     A_(i)    K(A_(i))

 1   00000    0           0.00606    0.00606        0.00606    0.00606
-----------------------------------------------------------------------------
 2   00001    1           0.01415                   0.02021
 3   00010    1           0.06130                   0.08151
 4   00100    1           0.00142                   0.08293
 5   01000    1           0.01078                   0.09371
 6   10000    1           0.00181    0.08945        0.09552    0.09552
 7   00011    2           0.14303                   0.23855
 8   00101    2           0.00332                   0.24186
 9   00110    2           0.01438                   0.25624
10   01001    2           0.02515                   0.28139
11   01010    2           0.10897                   0.39036
12   01100    2           0.00253                   0.39289
13   10001    2           0.00423                   0.39711
14   10010    2           0.01831                   0.41542
15   10100    2           0.00042                   0.41585
16   11000    2           0.00322    0.32355        0.41906    0.41906
17   00111    3           0.03355                   0.45262
18   01011    3           0.25427                   0.70689
19   01101    3           0.00590                   0.71278
20   01110    3           0.02556                   0.73835
21   10011    3           0.04272                   0.78107
22   10101    3           0.00099                   0.78206
23   10110    3           0.00429                   0.78635
24   11001    3           0.00751                   0.79387
25   11010    3           0.03255                   0.82642
26   11100    3           0.00076    0.40811        0.82717    0.82717
27   01111    4           0.05964                   0.88682
28   10111    4           0.01002
29   11011    4           0.07595
30   11101    4           0.00176
31   11110    4           0.00764    0.15501        0.98218    0.98218
-----------------------------------------------------------------------------
32   11111    5           0.01782    0.01782        1.00000    1.00000

Figure  3.  Hypothetical  5-component  example:  Summary  of  Kills  test. 

The hypothesized probabilities are: [0.23, 0.64, 0.19, 0.91, 0.70]

                              Probability               Cumulative Probability
 i   Vector   MLR         P(A_(i))   P[M(A_(i))]    B_i        B[M(A_(i))]
     A_(i)    M(A_(i))

 1   10100    0           0.00042    0.00042        0.00042    0.00042
 2   00100    1           0.00142                   0.00185
 3   10000    1           0.00181                   0.00366
 4   10101    1           0.00099                   0.00465
 5   10110    1           0.00429                   0.00894
 6   11100    1           0.00076    0.00927        0.00970    0.00970
-----------------------------------------------------------------------------
 7   00000    2           0.00606                   0.01576
 8   00101    2           0.00332                   0.01908
 9   00110    2           0.01438                   0.03346
10   01100    2           0.00253                   0.03599
11   10001    2           0.00423                   0.04021
12   10010    2           0.01831                   0.05852
13   10111    2           0.01002                   0.06854
14   11000    2           0.00322                   0.07176
15   11101    2           0.00176                   0.07352
16   11110    2           0.00764    0.07146        0.08116    0.08116
17   00001    3           0.01415                   0.09530
18   00010    3           0.06130                   0.15660
19   00111    3           0.03355                   0.19015
20   01000    3           0.01078                   0.20093
21   01101    3           0.00590                   0.20683
22   01110    3           0.02556                   0.23239
23   10011    3           0.04272                   0.27511
24   11001    3           0.00751                   0.28262
25   11010    3           0.03255                   0.31517
26   11111    3           0.01782    0.25183        0.33299    0.33299
27   00011    4           0.14303                   0.47602
28   01001    4           0.02515                   0.50117
29   01010    4           0.10897                   0.61014
30   01111    4           0.05964                   0.66978
31   11011    4           0.07595    0.41274        0.74573    0.74573
32   01011    5           0.25427    0.25427        1.00000    1.00000

Figure  4.  Hypothetical  5-component  example:  Summary  of  More-Likely  Response  (MLR) 
test. 

The hypothesized probabilities are: [0.23, 0.64, 0.19, 0.91, 0.70]

                              Probability               Cumulative Probability
 i   Vector   SDM         P(A_(i))   P[S(A_(i))]    B_i        B[S(A_(i))]
     A_(i)    S(A_(i))

 1   10100    2.9767      0.00042    0.00042        0.00042    0.00042
 2   11100    2.6967      0.00076    0.00076        0.00118    0.00118
 3   10101    2.5767      0.00099    0.00099        0.00217    0.00217
 4   00100    2.4367      0.00142    0.00142        0.00359    0.00359
 5   10000    2.3567      0.00181    0.00181        0.00540    0.00540
 6   11101    2.2967      0.00176    0.00176        0.00716    0.00717
 7   01100    2.1567      0.00253                   0.00969
 8   10110    2.1567      0.00429    0.00682        0.01399    0.01399
 9   11000    2.0767      0.00322    0.00322        0.01721    0.01721
10   00101    2.0367      0.00332    0.00332        0.02053    0.02053
11   10001    1.9567      0.00423    0.00423        0.02475    0.02475
12   11110    1.8767      0.00764    0.00764        0.03239    0.03239
13   00000    1.8167      0.00606    0.00606        0.03845    0.03845
-----------------------------------------------------------------------------
14   01101    1.7567      0.00590
15   10111    1.7567      0.01002    0.01592
16   11001    1.6767      0.00751    0.00751
17   00110    1.6167      0.01438    0.01438
18   01000    1.5367      0.01078                   0.08704
19   10010    1.5367      0.01831    0.02909        0.10535    0.10535
20   11111    1.4767      0.01782    0.01782        0.12316    0.12316
21   00001    1.4167      0.01415    0.01415        0.13731    0.13731
22   01110    1.3367      0.02556    0.02556        0.16287    0.16287
23   11010    1.2567      0.03255    0.03255        0.19542    0.19542
24   00111    1.2167      0.03355    0.03355        0.22897    0.22897
25   01001    1.1367      0.02515                   0.25412
26   10011    1.1367      0.04272    0.07287        0.29684    0.29684
27   00010    0.9967      0.06130    0.06130        0.35814    0.35814
28   01111    0.9367      0.05964    0.05964        0.41778    0.41778
29   11011    0.8567      0.07595    0.07595        0.49373    0.49373
30   01010    0.7167      0.10897    0.10897        0.60270    0.60270
31   00011    0.5967      0.14303    0.14303        0.74573    0.74573
32   01011    0.3167      0.25427    0.25427        1.00000    1.00000

Figure 5. Hypothetical 5-component example: Summary of Squared Distance Measure
(SDM) test.

Order by Probability (OP) Procedure -- RR_OP (14 outcomes)

    10100  11100  10101  00100  11101  10000  01100
    11000  00101  10001  10110  01101  00000  11001

Kills Test -- RR_K (2 outcomes)

    00000  11111

More-Likely Response (MLR) Test -- RR_MLR (6 outcomes)

    10100  00100  10000  10101  10110  11100

Squared Distance Measure (SDM) Test -- RR_S (13 outcomes)

    10100  11100  10101  00100  10000  11101  01100
    10110  11000  00101  10001  11110  00000

Figure  6.  Hypothetical  5-component  example:  Rejection  regions  for  each  procedure. 

VIII.  PROCEDURE  COMPARISONS


To study the four procedures, 2000 pairs of $l$-dimensional probability vectors were randomly
generated for $l$ = 5, 6, 7, 8, 9, and 10. The first vector of a pair $(h_0, h_A)$ was
considered the hypothesized probability vector and the second was considered the alternative
probability vector. The level of significance was set at $\alpha = .05$. The power of each
test (i.e., the probability of rejecting $H_0$ when $H_A$ is true) was computed for each pair
$(h_0, h_A)$.
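
The power computation for one pair $(h_0, h_A)$ can be sketched as follows, using the OP rejection region as the example procedure. The report does not state how the probability vectors were drawn, so the uniform draws on (0, 1) below are an assumption, as are the function names `prob`, `op_region`, and `power`.

```python
from itertools import product
import random

def prob(a, p):
    """P(A = a) for independent components with kill probabilities p."""
    out = 1.0
    for a_i, p_i in zip(a, p):
        out *= p_i if a_i else 1.0 - p_i
    return out

def op_region(h0, alpha=0.05, tol=1e-12):
    """Rarest outcomes under h0 whose total probability stays within alpha."""
    ranked = sorted(product((0, 1), repeat=len(h0)), key=lambda a: prob(a, h0))
    cum, c = 0.0, 0
    for j, a in enumerate(ranked, start=1):
        cum += prob(a, h0)
        if cum > alpha + tol:
            break
        if j == len(ranked) or abs(prob(a, h0) - prob(ranked[j], h0)) > tol:
            c = j                          # do not split equally likely outcomes
    return ranked[:c]

def power(region, h_alt):
    """Probability of landing in the rejection region when h_alt is true."""
    return sum(prob(a, h_alt) for a in region)

rng = random.Random(0)                     # fixed seed, for repeatability only
l = 5
h0 = [rng.random() for _ in range(l)]      # assumption: uniform on (0, 1)
hA = [rng.random() for _ in range(l)]
region = op_region(h0)
print(power(region, h0), power(region, hA))
```

Evaluating `power(region, h0)` returns the actual alpha level of the test, which never exceeds the nominal $\alpha$; repeating the last few lines over many random pairs would give the kind of point cloud the comparison relies on.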

Figure 7 shows a graphic way of comparing the power of two test procedures, call them A and B.
For a given pair of vectors $(h_0, h_A)$, we compute the ordered pair $(\beta_A, \beta_B)$,
where $\beta_A$ and $\beta_B$ are the powers of A and B respectively. Then the scatterplot of
all 2000 points $(\beta_A, \beta_B)$ will give us a comparison of the two tests. If Procedure
A is more powerful than Procedure B, then we expect to see a graph similar to Figure 7(A). If
the opposite is true, the plot will be similar to Figure 7(B). But if both procedures have
approximately the same power, then Figure 7(C) is the proper scatterplot.

Comparisons of the four procedures consistently show the OP procedure to be the most powerful
(see Figures 8 and 9). The SDM test appears to be only slightly less powerful. The MLR and
Kills tests both showed poor ability to reject $H_0$ when other $l$'s were used.

These findings are reinforced when the median power of each procedure is computed. In Figure
10, we see again that OP slightly outpowers SDM, with MLR and Kills exhibiting less power. It
is impossible to tell for certain which of the four procedures is best unless we know $h_A$.
But from the strictest viewpoint, in which we assume no prior knowledge of the $p_i$'s, this
is not the case. When we have no information about $h_A$, we must assume that all possible
$h_A$'s are equally likely. Therefore it makes sense to pick the procedure with the greatest
number of outcomes in its rejection region.


IX.  THE  FISHBOWL  ARGUMENT 

Assume that the null hypothesis we are interested in testing is one that completely defines
the distribution of the outcome space $\Omega$. For example, our illustrative example from
Figures 1-6 is concerned with the null hypothesis

$$H_0:\ p_1 = .23,\ p_2 = .64,\ p_3 = .19,\ p_4 = .91,\ p_5 = .70 .$$

Given the estimated probabilities in this hypothesis, $P(A_i)$ can be calculated for all
possible outcomes. Another null hypothesis that we may be interested in is:

$$H_0:\ p_1 = .23,\ p_2 = .64,\ p_3 = .19,\ p_4 = p_5 .$$

Note that this does not contain all the probability estimates needed to compute $P(A)$;
however, it is certainly a valid hypothesis. We will define a simple null hypothesis to be one that
completely  defines  the  distribution  of  the  outcome  space,  and  denote  it  by  Hq. 

[Figure 10 plot: median power versus $l$, with one curve each for the OP procedure, Kills
test, MLR test, and SDM test.]

Figure 10. Median power of the four candidate procedures.


Now make the additional assumption that an experiment has a finite outcome space, $\Omega$.
If we are interested in testing some simple null hypothesis at the $\alpha$ level of
significance, how many different ways can we perform a test of that hypothesis, and which is
the optimal way?

To attempt to answer these questions, let $\Omega$ be of size $N$, let $m < N$, and let
$\{O_1, O_2, \ldots, O_m\}$ be any subset of $\Omega$ such that, under the simple null
hypothesis,

$$P(O_1) + P(O_2) + \cdots + P(O_m) \le \alpha .$$

Then we claim that $\{O_1, O_2, \ldots, O_m\}$ is a rejection region for some test of the
simple null hypothesis. Why? Because under that hypothesis, the chance of observing an outcome
from this subset is less than or equal to alpha, our desired level of significance. Therefore
we have the foundations of a statistical test, even if the reasoning behind the selection of
the subset is not specified.

To help explain this concept, Figure 11 shows an example of an outcome set with $N = 16$.
Each circle represents one of the 16 possible outcomes and its size is proportional to the
density of the outcome under the simple null hypothesis. In Figure 12, each group of circles
(outcomes) connected by a horizontal line symbolizes a subset satisfying our condition (i.e.,
$\alpha \le .05$) to be a rejection region for some test of the simple null hypothesis. The
probability of observing an outcome from each subset is indicated by the number in the right
column. Note that these values (which are computed by summing the probabilities of the
outcomes in the subset) are all less than or equal to .05, the desired alpha level, and that
the addition of any other outcome to each set makes the new sum greater than .05. We therefore
consider each of these 24 subsets a rejection region to test the simple null hypothesis.

For each rejection region, the probability of observing an outcome in that region is at most
$\alpha$ under the simple null hypothesis. However, if some alternative hypothesis is true,
the probability of observing an outcome in the rejection region (thereby correctly rejecting
the simple null hypothesis) is some other value $1 - \beta$, which we call the power of the
test. Unfortunately the power is unknown to us if we do not know which alternative hypothesis
is true. At best, we can only say that all alternative hypotheses are equally likely.
Therefore each outcome in a rejection region is equally likely to occur, and the optimal
rejection region is the one which contains the most outcomes. The way to build this rejection
region is to include the least likely outcomes until no more can be added. In Figure 12, the
star labels the rejection region that we would use since it contains six outcomes, more than
any other rejection region.

As  an  analogy,  assume  you  are  given  a  small  fishbowl  partially  filled  with  water  and  a 
large  number  of  pebbles  with  which  to  completely  fill  it.  Also  assume  that  each  pebble  has  a 
different  volume.  If  you  were  instructed  to  raise  the  water  level  to  the  top  of  the  fishbowl  by 
adding  as  many  pebbles  as  possible,  how  would  you  set  out  to  do  so?  Instead  of  occupying 
space  with  one  large  pebble,  you  would  fill  the  same  space  with  smaller  pebbles.  Therefore 
you  would  begin  by  selecting  the  smallest  pebble  and  putting  it  in  the  bowl.  Then  you  would 
drop  in  the  second  smallest  pebble.  The  third  pebble  would  be  the  next  smallest,  and  so  on 
until  the  water  level  reaches  the  brim.  The  remaining  pebbles  would  of  course  be  the  largest 
ones. 

Figure 11. Sample outcome space with events drawn proportional to density.

Figure 12. All possible 5% rejection regions for sample outcome space.

The OP procedure uses this "fishbowl" technique by filling up the rejection region with those
outcomes having the smallest probabilities. The only restriction to the technique is that the
last outcome entered into the rejection region cannot have the same density as any outcome
excluded from the region.
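
The fishbowl argument can be checked numerically on a small outcome space. The sketch below compares the smallest-first rule with a brute-force search over all subsets; the eight probabilities are hypothetical (they are not the values behind Figure 11), they are chosen to be distinct so the tie restriction never applies, and the function names are assumptions.

```python
from itertools import combinations

def greedy_region(probs, alpha):
    """Add outcomes from least to most likely while the total stays within alpha."""
    chosen, total = [], 0.0
    for idx in sorted(range(len(probs)), key=lambda i: probs[i]):
        if total + probs[idx] > alpha:
            break
        chosen.append(idx)
        total += probs[idx]
    return chosen

def largest_region_size(probs, alpha):
    """Brute force: the largest m for which some m-subset has total mass <= alpha."""
    for m in range(len(probs), 0, -1):
        if any(sum(probs[i] for i in subset) <= alpha
               for subset in combinations(range(len(probs)), m)):
            return m
    return 0

# Hypothetical 8-outcome space with distinct masses summing to 1.
probs = [0.30, 0.25, 0.20, 0.12, 0.07, 0.03, 0.02, 0.01]
for alpha in (0.05, 0.10):
    print(alpha, len(greedy_region(probs, alpha)), largest_region_size(probs, alpha))
```

In both cases the greedy region is as large as any admissible subset, which is the property the pebble analogy describes.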


X.  FURTHER  NOTES  AND  RECOMMENDATIONS 

The OP procedure was only tested for $l$ = 5, 6, 7, 8, 9, and 10, for two reasons. First, the
data that spawned this research were only for small $l$, namely $l \le 10$. Second, the
computational time and storage needed to compute the $P(A)$'s, $B$'s, etc. grow nearly
exponentially with each unit increase in $l$. Simulations using $l = 12$ were attempted but
ran non-stop for a couple of days on a Gould 9050 minicomputer.*

Since the SDM test does a good job of mimicking the OP procedure, it may be an easier test to
use when $l$ is larger than 10, if the distribution of $S(A)$ can be approximated. Initial
attempts to find such an approximation were not successful. A listing of the computer program
is given in the Appendix at the end of this report.


XI.  CONCLUSIONS

This problem is complicated by the fact that we must judge the entire set of
computer-generated estimates on a single shot. It must be admitted that while OP is the best procedure
of those studied, occasionally H0 was not rejected although the alternative hypothesis differed
greatly from it. Great care must be taken in interpreting the final decision. In rejecting H0
we can confidently say that the set of hypothesized kill probabilities is incorrect. However,
venturing to say which components are incorrect and by how much is dangerous. It is vital to
remember that we are trying to make inferences from one round. If we do not reject H0,
then this does not allow us to "accept H0 as being true". It simply says that there is not
enough evidence to say that H0 is false. We cannot validate the estimates; we can only state
that they are consistent with the live-fire results.
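
The two possible decisions can be illustrated with a short Python sketch. It restates the OP rule in a form assumed to be equivalent: the observed outcome lies in the rejection region when the total probability of all outcomes no more likely than it (ties included) is below the significance level. The names `op_tail_probability` and `op_decision` and the tolerance `eps` are hypothetical, and independence of the components is assumed.

```python
from itertools import product

def op_tail_probability(p, observed, eps=1e-8):
    """Total probability of outcomes no more likely than `observed`."""
    def prob(a):
        out = 1.0
        for a_i, p_i in zip(a, p):
            out *= p_i if a_i == 1 else 1.0 - p_i
        return out

    p_obs = prob(observed)
    return sum(q for q in (prob(a) for a in product((0, 1), repeat=len(p)))
               if q <= p_obs + eps)

def op_decision(p, observed, alpha):
    # Reject only when the whole tie class of the observed outcome fits
    # inside the rejection region, i.e. the tail probability is below alpha.
    if op_tail_probability(p, observed) < alpha:
        return "reject H0"
    return "do not reject H0"
```

With illustrative probabilities (0.9, 0.8, 0.7) and alpha = 0.05, the outcome (0, 1, 0) leads to rejection, while (1, 0, 0) does not. The second result says only that the single shot is consistent with the estimates, not that the estimates are correct.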

We must take care to see that our assumption of independent components is met. All
the calculations involved in the OP procedure are made under this assumption. Therefore
the selection of components is critical, and we should avoid including incendiary components,
shielded components, etc., in the analysis.
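
The role of the independence assumption can be made explicit. Under it, the probability of an outcome vector is a simple product over components, which is the calculation the Appendix program performs for every vector. In the sketch below, p_i is taken to be the hypothesized kill probability of the ith component (p(i) in the program), and the numerical values are illustrative only.

```latex
P(A) \;=\; \prod_{i=1}^{l} p_i^{\,a_i}\,(1 - p_i)^{\,1 - a_i},
\qquad a_i \in \{0, 1\}

% Worked example (illustrative values, not from the report):
% l = 3, (p_1, p_2, p_3) = (0.9, 0.8, 0.7), A = (1, 0, 1)
P(A) = 0.9 \times (1 - 0.8) \times 0.7 = 0.126
```

If two components are not independent, for example when one shields the other, this product no longer gives the probability of the joint outcome, and the ordering of outcomes on which the rejection region rests may change.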

The OP procedure works best of the four tried because it does not lose any information
by collapsing the data into a univariate test statistic. It simply creates the rejection region
containing the most outcomes.


*  Lawrence D. Losie of the Ballistic Research Laboratory has made recommendations for improving the computational efficiency of the
OP procedure. This work is unpublished but may be obtained through private communication with Mr. Losie.

TABLE OF SYMBOLS


A              vector of observed outcomes
A(i)           ith ordered vector of observed outcomes
AFV            armored fighting vehicle
ai             ith component of vector A
α              level of significance
B[K(A(i))]     probability of observing that number of kills (or less) associated
               with the ith ordered vector A(i)
B[M(A(i))]     probability of observing that MLR value (or less) associated with
               the ith ordered vector A(i)
B[S(A(i))]     probability of observing that SDM value (or less) associated with
               the ith ordered vector A(i)
Bi             cumulative function value of vector A(i)
[illegible]    power of some test procedure A
c              critical value for one-sided rejection region
C1             lower critical value for two-sided rejection region
C2             upper critical value for two-sided rejection region
[illegible]    more likely response value for ith component of vector A
E              expected value operator
HA             alternative hypothesis
hA             vectors of alternative probabilities
H0             null hypothesis
[illegible]    vector of hypothesized probabilities
[illegible]    null hypothesis which completely defines the distribution of the
               outcome space
K(A)           number of kills in vector A
l              number of components
M(A)           number of "more-likely-responses" in vector A
MLR            more-likely-response
MOP            measure-of-performance
Ml             expected number of non-kills for the group of components whose
               estimated probability of kill is less than one-half
My             expected number of kills for the group of components whose
               estimated probability of kill is greater than one-half
Oi             an element of the outcome space Ω
OP             order-by-probability
P(A)           probability of vector A
P[K(A(i))]     probability of observing that number of kills associated with the
               ith ordered vector A(i)
P[M(A(i))]     probability of observing that MLR value associated with the ith
               ordered vector A(i)
P[S(A(i))]     probability of observing that SDM value associated with the ith
               ordered vector A(i)
Pk             probability of kill
[illegible]    true probability of kill for ith component
pi             estimated probability of kill for ith component
RR             rejection region
S              number of estimated probabilities equal to one-half
SDM            squared-distance-measure
S(A)           squared-distance-measure for vector A
Ω              set of all possible outcomes

APPENDIX


c  FILE: vul.f
c
c  This program takes a vector of k probabilities of 0,1 outcomes,
c  enumerates all possible outcome vectors and calculates the
c  probability of each, using the vector of outcome probabilities
c  given.  It then sorts the outcome vectors according to
c  their probability of occurrence.  It calculates and prints the
c  cumulative probability.
c
c  k < 13, is the dimension of the vector,
c  p(i), i=1,k is the vector of input probabilities,
c  jout(i,j) is the 2**k by k matrix of possible outcome vectors.
c
c  This program is written to run in the interactive mode but
c  it can be run in batch mode by reading k, the desired alpha level
c  and p(i), i=1,k from one file and writing the results in
c  another file.  For example, vul.e < data.inp > data.out will
c  read input from a file named data.inp and write the results
c  into a file called data.out.
c
c  The second dimension of jout must be at least k.
      common jout(4097,12),prob(4097),n,k
      double precision prob,cum(4097),psave
      dimension p(12)

      read(5,*) k
      read(5,*) dalp
      read(5,*) (p(i),i=1,k)
      epsilon=0.00000001
      n=2**k

c  GENERATE MATRIX OF ALL POSSIBLE OUTCOMES
      do 10 j=1,n
         do 10 i=1,k
            jout(j,i)=0
   10 continue

      do 20 i=1,k
         ni=2**(k-i)
         nj=2*ni
         do 20 nk=ni+1,n,nj
            do 20 nl=nk,nk+ni-1
               jout(nl,i)=1
   20 continue

      write(6,120)
      write(6,130) (p(i),i=1,k)
      write(6,140)
      write(6,150)

c  PROBABILITY OF EACH OUTCOME (INDEPENDENT COMPONENTS)
      do 30 i=1,n
         prob(i)=1.
         do 30 j=1,k
      prob(i)=prob(i)*p(j)**(jout(i,j))*(1.-p(j))**(1-jout(i,j))
   30 continue

c  ORDER ALL OUTCOMES BY PROBABILITY, FROM LOWEST TO HIGHEST
      do 50 j=1,n-1
         do 50 m=j+1,n
            if (prob(j).gt.prob(m)) then
               do 40 i=1,k
                  isave=jout(j,i)
                  jout(j,i)=jout(m,i)
                  jout(m,i)=isave
   40          continue
               psave=prob(j)
               prob(j)=prob(m)
               prob(m)=psave
            endif
   50 continue

c  CALCULATE CUMULATIVE DISTRIBUTION FUNCTION
      cum(1)=prob(1)
      do 60 j=2,n
         cum(j)=cum(j-1)+prob(j)
   60 continue

      do 70 i=1,n
         write(6,160) i,prob(i),cum(i),(jout(i,j),j=1,k)
   70 continue

c  DETERMINE REJECTION REGION
c  (irr = 0 means that no outcome can be rejected at this level)
      irr=n
   80 irr=irr-1
      if (irr.lt.1) goto 90
      if (cum(irr).ge.dalp) goto 80
      if (prob(irr+1)-prob(irr).lt.epsilon) goto 80
   90 talp=0.
      if (irr.ge.1) talp=cum(irr)

c  OUTPUT REJECTION REGION VECTORS
      write(6,170) irr
      do 110 i=1,irr
         write(6,180) (jout(i,j),j=1,k)
  110 continue
      write(6,190) talp

  120 format('The input probabilities are:')
  130 format(12f6.3)
  140 format(//' Vector    Prob.  Cum.Prob.  Vector')
  150 format(' No. '/)
  160 format(i6,2x,e10.5,f10.6,2x,12i2)
  170 format(/'The rejection region consists of these ',i4,
     +       ' vectors: '/)
  180 format(4x,12i2)
  190 format(/'The true alpha level is ',f6.3)
      stop
      end